Significance-Based Categorical Data Clustering
Although numerous algorithms have been proposed to solve the categorical data
clustering problem, how to assess the statistical significance of a set of
categorical clusters remains unaddressed. To fill this void, we employ the
likelihood ratio test to derive a test statistic that can serve as a
significance-based objective function in categorical data clustering.
Consequently, a new clustering algorithm is proposed in which the
significance-based objective function is optimized via a Monte Carlo search
procedure. As a by-product, we can further calculate an empirical p-value to
assess the statistical significance of a set of clusters and develop an
improved gap statistic for estimating the cluster number. Extensive
experimental studies suggest that our method is able to achieve comparable
performance to state-of-the-art categorical data clustering algorithms.
Moreover, the effectiveness of such a significance-based formulation on
statistical cluster validation and cluster number estimation is demonstrated
through comprehensive empirical results.
Comment: 36 pages, 6 figures
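The empirical p-value mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the null model or the exact test statistic, so this assumes the caller has already computed the observed statistic and a list of statistics from Monte Carlo null samples; the function name `empirical_p_value` and the common "+1" correction are illustrative assumptions, not the authors' procedure.

```python
def empirical_p_value(observed, null_statistics):
    """Monte Carlo p-value: share of null statistics at least as large
    as the observed one, with the common +1 correction (an assumption
    here) so that the estimate is never exactly zero."""
    exceed = sum(1 for s in null_statistics if s >= observed)
    return (1 + exceed) / (1 + len(null_statistics))


# Hypothetical values: an observed statistic and nine null draws.
p = empirical_p_value(5.0, [1, 2, 3, 4, 6, 7, 8, 9, 10])
print(p)
```

A small p-value would suggest the set of clusters is unlikely to arise under the chosen null model.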
A testing-based approach to assess the clusterability of categorical data
The objective of clusterability evaluation is to check whether a clustering
structure exists within the data set. As a crucial yet often-overlooked issue
in cluster analysis, it is essential to conduct such a test before applying any
clustering algorithm. If a data set is unclusterable, any subsequent clustering
analysis would not yield valid results. Despite its importance, the majority of
existing studies focus on numerical data, leaving the clusterability evaluation
issue for categorical data as an open problem. Here we present TestCat, a
testing-based approach to assess the clusterability of categorical data in
terms of an analytical p-value. The key idea underlying TestCat is that
clusterable categorical data possess many strongly correlated attribute pairs
and hence the sum of chi-squared statistics of all attribute pairs is employed
as the test statistic for p-value calculation. We apply our method to a set
of benchmark categorical data sets, showing that TestCat outperforms those
solutions based on existing clusterability evaluation methods for numeric data.
To the best of our knowledge, our work provides the first way to effectively
recognize the clusterability of categorical data in a statistically sound
manner.
Comment: 19 pages, 13 figures
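The test statistic described above, the sum of chi-squared statistics over all attribute pairs, can be sketched as follows. This is a minimal illustration in plain Python: the function names are hypothetical, and the analytical p-value derivation used by TestCat is not given in the abstract, so it is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def chi_squared(x, y):
    """Pearson chi-squared statistic for two categorical columns."""
    n = len(x)
    joint = Counter(zip(x, y))   # observed cell counts
    row = Counter(x)             # marginal counts of the first attribute
    col = Counter(y)             # marginal counts of the second attribute
    stat = 0.0
    for a, ra in row.items():
        for b, cb in col.items():
            expected = ra * cb / n
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def pairwise_chi_squared_sum(rows):
    """Sum the chi-squared statistic over every pair of attributes."""
    columns = list(zip(*rows))
    return sum(
        chi_squared(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    )


# Toy data: two perfectly associated attributes.
correlated = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
print(pairwise_chi_squared_sum(correlated))
```

Larger sums indicate more strongly associated attribute pairs, which is the signal the abstract links to clusterability.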
Interpretable Sequence Clustering
Categorical sequence clustering plays a crucial role in various fields, but
the lack of interpretability in cluster assignments poses significant
challenges. Sequences inherently lack explicit features, and existing sequence
clustering algorithms heavily rely on complex representations, making it
difficult to explain their results. To address this issue, we propose a method
called Interpretable Sequence Clustering Tree (ISCT), which combines sequential
patterns with a concise and interpretable tree structure. ISCT leverages k-1
patterns to generate k leaf nodes, corresponding to k clusters, which provides
an intuitive explanation on how each cluster is formed. More precisely, ISCT
first projects sequences into random subspaces and then utilizes the k-means
algorithm to obtain high-quality initial cluster assignments. Subsequently, it
constructs a pattern-based decision tree using a boosting-based construction
strategy in which sequences are re-projected and re-clustered at each node
before mining the top-1 discriminative splitting pattern. Experimental results
on 14 real-world data sets demonstrate that our proposed method provides an
interpretable tree structure while delivering fast and accurate cluster
assignments.
Comment: 11 pages, 6 figures
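The idea that k-1 patterns yield k leaf nodes can be illustrated with a small sketch of how such a tree would assign a sequence to a cluster. The tree layout, the function names, and the assumption that a pattern matches as a possibly non-contiguous subsequence are all illustrative choices; the abstract does not describe ISCT's actual data structures, and the tree construction step is not shown.

```python
def contains(sequence, pattern):
    """True if pattern occurs in sequence as a subsequence
    (items in order, gaps allowed; an assumption of this sketch)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def assign(tree, sequence):
    """Walk the tree until a leaf (an int cluster label) is reached.
    Internal nodes are (pattern, child_if_match, child_if_no_match)."""
    node = tree
    while not isinstance(node, int):
        pattern, yes, no = node
        node = yes if contains(sequence, pattern) else no
    return node


# Hypothetical tree: 2 splitting patterns, hence 3 leaves (clusters 0, 1, 2).
tree = (("a", "b"), 0, (("c",), 1, 2))

for seq in ["xaxb", "bca", "bbb"]:
    print(seq, "->", assign(tree, seq))
```

Each cluster is then explained by the sequence of pattern tests on the path from the root to its leaf.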
AdaBrowse: Adaptive Video Browser for Efficient Continuous Sign Language Recognition
Raw videos have been shown to contain considerable feature redundancy: in
many cases only a portion of the frames is sufficient for
accurate recognition. In this paper, we are interested in whether such
redundancy can be effectively leveraged to facilitate efficient inference in
continuous sign language recognition (CSLR). We propose a novel adaptive model
(AdaBrowse) to dynamically select the most informative subsequence from input
video sequences by modelling this problem as a sequential decision task.
Specifically, we first utilize a lightweight network to quickly scan input videos
to extract coarse features. Then these features are fed into a policy network
to intelligently select a subsequence to process. The corresponding subsequence
is finally inferred by a normal CSLR model for sentence prediction. As only a
portion of the frames is processed in this procedure, the total computation is
considerably reduced. Besides temporal redundancy, we are also interested in
whether the inherent spatial redundancy can be exploited as well
to achieve further efficiency, i.e., by dynamically selecting the lowest input
resolution that suffices for each sample; the resulting model is referred to as AdaBrowse+. Extensive
experimental results on four large-scale CSLR datasets, i.e., PHOENIX14,
PHOENIX14-T, CSL-Daily and CSL, demonstrate the effectiveness of AdaBrowse and
AdaBrowse+ by achieving accuracy comparable to state-of-the-art methods with
1.44× throughput and 2.12× fewer FLOPs. Comparisons with other
commonly-used 2D CNNs and adaptive efficient methods verify the effectiveness
of AdaBrowse. Code is available at
https://github.com/hulianyuyy/AdaBrowse.
Comment: ACMMM202
Self-Emphasizing Network for Continuous Sign Language Recognition
The hands and face play an important role in expressing sign language, and their features are often specially leveraged to improve system performance. However, to effectively extract visual representations and capture trajectories for the hands and face, previous methods incur high computational cost and increased training complexity. They usually employ extra heavy pose-estimation networks to locate human body keypoints or rely on additional pre-extracted heatmaps for supervision. To relieve this problem, we propose a self-emphasizing network (SEN) to emphasize informative spatial regions in a self-motivated way, with little extra computation and without additional expensive supervision. Specifically, SEN first employs a lightweight subnetwork to incorporate local spatial-temporal features to identify informative regions, and then dynamically augments the original features via attention maps. We also observe that not all frames contribute equally to recognition, so we present a temporal self-emphasizing module to adaptively emphasize discriminative frames and suppress redundant ones. A comprehensive comparison with previous methods equipped with hand and face features demonstrates the superiority of our method, even though those methods require heavy computation and rely on expensive extra supervision. Remarkably, with little extra computation, SEN achieves new state-of-the-art accuracy on four large-scale datasets: PHOENIX14, PHOENIX14-T, CSL-Daily, and CSL. Visualizations verify the effects of SEN on emphasizing informative spatial and temporal features. Code is available at https://github.com/hulianyuyy/SEN_CSL
Mapping of quantitative trait loci for growth traits around the first overwintering period in Songpu mirror carp (Cyprinus carpio L.) cultured in Northeast China
QTL mapping studies of growth traits based on high-density linkage maps in Northeast China have great significance for local enterprises in achieving marker-assisted selection (MAS) and improving the selection accuracy of parent fish. Here, we constructed a high-density genetic linkage map with 11,445 single nucleotide polymorphism (SNP) markers in a full-sib F1 family consisting of 120 progenies in Songpu mirror carp (Cyprinus carpio) reared in Northeast China. The consensus map covered 5471.93 centimorgans (cM) across the 50 linkage groups (LGs) with an average resolution of 0.54 cM. Around the first overwintering period, a total of 15 QTLs for growth traits were identified on five LGs (LG7, LG8, LG14, LG30, and LG41). All of them were responsible for body weight (BW) at both 150 (before overwintering) and 350 (after overwintering) days, explaining 9.8–17.9 % of the phenotypic variation (PV). Two major-effect QTLs were detected on LG30, which explained 16.5 % and 16.4 % (qTL9–30), and 15.8 % and 17.9 % (qTL10–30) of the PV at 150 and 350 days, respectively. In addition, 15 loci were identified for 4 growth traits other than BW at both time points, including 6 loci for body length (BL), 7 loci for body height (BH), 3 loci for body thickness (BT) and 5 loci for head length (HL). These loci explained PV ranging from 15.9 % to 21.2 % for BL, 9.3–19.6 % for BH, 9.6–15.5 % for BT, and 10.6–18.1 % for HL. Nine candidate genes (IGF1b, LEPb, PACAP1a, GHSR1a, PPSS2, IRS1, APOAIb, IRS2b and ADSS1) were identified, five of which were directly related to growth hormone (GH), including IGF1b, PACAP1a, GHSR1a, PPSS2 and IRS1. Notably, three genes (GHSR1a, PPSS2 and IRS1) were derived from two major-effect QTLs (qTL9–30 and qTL10–30). In conclusion, these novel findings may shed light on early selection of growth traits of common carp cultured in Northeast China and other relatively high-latitude regions.
Yishen-tongbi decoction inhibits excessive activation of B cells by activating the FcγRIIb/Lyn/SHP-1 pathway and attenuates the inflammatory response in CIA rats
Rheumatoid arthritis (RA) is a chronic autoimmune disease. Strong evidence supports that excessive activation of B cells plays a critical role in the pathogenesis of RA. Fc gamma receptor IIb (FcγRIIb) is the B cell inhibitory receptor and inhibits B cell receptor (BCR) signalling in part by selectively dephosphorylating CD19, which is considered a co-receptor for BCR and is essential for B cell activation. Our previous study demonstrated that a FcγRIIb I232T polymorphism presented a strong genetic link to RA and may lead to the excessive activation of B cells. Therefore, novel therapeutic strategies and drugs that can effectively inhibit the excessive activation of B cells by regulating FcγRIIb are necessary for the treatment of RA. To this end, we used Burkitt’s lymphoma ST486 human B cells (lacking endogenous FcγRIIb) transfected with the 232Thr loss-of-function mutant to construct a FcγRIIb mutant cell line (ST486), and we demonstrated that Yishen-tongbi decoction (YSTB) treatment reduced proliferation and promoted apoptosis in ST486 cells in a dose-dependent manner. Furthermore, the intracellular Ca2+ flux of ST486 cells was decreased after treatment with YSTB, inhibiting the excessive activation of ST486 cells, and these effects correlated with the CD19/FcγRIIb-Lyn-SHP-1 pathways. Our data showed that YSTB treatment inhibited the expression of phosphorylated CD19 and upregulated the protein expression of FcγRIIb, Lyn, and SHP-1. Additionally, the CIA model was established to explore the anti-inflammatory and inhibitory effects of YSTB on bone destruction, and we found that YSTB decreased the paw oedema and arthritis index (AI) in CIA rats. It is worth mentioning that YSTB clearly decreased the AI earlier than methotrexate (MTX) (day 10 vs 16). Moreover, synovial hyperplasia, inflammatory cell infiltration and cartilage surface erosion in CIA rats were noticeably reduced after treatment with YSTB, as evidenced by histopathological examination.
Finally, we found that YSTB treatment suppressed bone erosion and joint space score (JNS) in CIA rats, as evidenced by radiographic assessment. In summary, these data suggest that YSTB has great therapeutic potential for RA treatment.
Genetic Differentiation of an Endangered Megalobrama terminalis Population in the Heilong River within the Genus Megalobrama
Megalobrama terminalis, which inhabits the Sino-Russian Heilong-Amur River Basin, has declined critically since the 1960s. It was listed in the Red Book of Endangered Fish Species by the Russian Federation in 2004. To guide the utilization and conservation programs of M. terminalis in the Heilong River (MTH), 3.1 kb of mitochondrial DNA (mtDNA) concatenated sequences and sequence-related amplified polymorphism (SRAP) markers (15 primer combinations) were applied to explore the genetic divergence and population differentiation of MTH within the genus Megalobrama. Clear genetic divergence between MTH and six other populations of the genus Megalobrama was found by haplotype network (mtDNA) and principal component (SRAP) analyses. Moreover, the STRUCTURE analysis based on SRAP data showed that MTH could be assigned to a particular cluster, whereas conspecific M. terminalis in the Qiantang River and Jinsha River Reservoir belonged to the same cluster. Analysis of molecular variance (AMOVA) and Fst statistics for the mtDNA and SRAP data revealed significant genetic variance and differentiation among all detected populations. Taken together, the results suggest that MTH is strongly genetically differentiated from other populations within the genus Megalobrama, which contributes to effective utilization in artificial cultivation and breeding of MTH. Furthermore, these results also provide a scientific basis for the management of MTH as a separate conservation unit.
Plant Traits Guide Species Selection in Vegetation Restoration for Soil and Water Conservation
Great efforts have been made to improve soil and water conservation capacity by restoring plant communities under different climatic and land-use types. However, how to select suitable species from local species pools that not only adapt to different site environments but also achieve certain soil and water conservation capacities is a great challenge in vegetation restoration for practitioners and scientists. So far, little attention has been paid to plant functional response and effect traits related to environmental resources and ecosystem functions. In this study, together with soil properties and ecohydrological functions, we measured seven plant functional traits for the most common species in different restoration communities in a subtropical mountain ecosystem. Multivariate optimization analyses were performed to identify the functional effect types and functional response types based on specific plant traits. We found that the community-weighted means of traits differed significantly among the four community types, and the plant functional traits were strongly linked with soil physicochemical properties and ecohydrological functions. Based on three optimal effect traits (specific leaf area, leaf size, and specific root length) and two response traits (specific leaf area and leaf nitrogen concentration), seven functional effect types in relation to the soil and water conservation capacity (interception of canopy and stemflow, maximum water-holding capacity of litter, maximum water-holding capacity of soil, soil surface runoff, and soil erosion) and two plant functional response types to soil physicochemical properties were identified. The redundancy analysis showed that the sum of all canonical eigenvalues only accounted for 21.6% of the variation in functional response types, which suggests that community effects on soil and water conservation cannot explain the overall structure of community responses related to soil resources.
The eight overlapping species between the plant functional response types and functional effect types were ultimately selected as the key species for vegetation restoration. Based on the above results, we offer an ecological basis for choosing appropriate species based on functional traits, which may be very helpful for practitioners involved in ecological restoration and management.